Local Behavior of Sparse Analysis Regularization: 
Applications to Risk Estimation 

Samuel Vaiter, Charles- Alban Deledalle, Gabriel Peyre 

CEREMADE, CNRS, Universite Paris-Dauphine, Place du Marechal De Lattre De 
Tassigny, 75775 Paris Cedex 16, France 

Charles Dossal 

1MB, Universite Bordeaux 1, 351, Cours de la Liberation, 33405 Talence Cedex, France 

Jalal Fadili 

GREYC, CNRS-ENSICAEN-Universite de Caen, 6, Bd du Marechal Juin, 14050 Caen 

Cedex, France 



Abstract 

In this paper, we aim at recovering an unknown signal xq, from noisy mea- 
surements y = $xo + w, where 3> is an ill-conditioned or singular linear operator 
and w accounts for some noise. To regularize such an ill-posed inverse problem, 
we impose an analysis sparsity prior. More precisely, the recovery is cast as a 
convex optimization program where the objective is the sum of a quadratic data 
fidelity term and a regularization term formed of the ^-norm of the correlations 
between the sought after signal and atoms in a given (generally overcomplete) 
dictionary. The £ 1 -sparsity analysis prior is weighted by a regularization pa- 
rameter A > 0. In this paper, we prove that any minimizcrs of this problem is a 
piecewise-affine function of the observations y and the regularization parameter 
A. As a byproduct, we exploit these properties to get an objectively guided 
choice of A. In particular, we develop an extension of the Generalized Stein 
Unbiased Risk Estimator (GSURE) and show that it is an unbiased and reliable 
estimator of an appropriately defined risk. The latter encompasses special cases 
such as the prediction risk, the projection risk and the estimation risk. We apply 
these risk estimators to the special case of ^-sparsity analysis regularization. 
We also discuss implementation issues and propose fast algorithms to solve the 
i 1 analysis minimization problem and to compute the associated GSURE. We 
finally illustrate the applicability of our framework to parameter(s) selection on 
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several imaging problems. 
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1. Introduction 

1.1. Regularization of Linear Inverse Problems 

In many applications, the goal is to recover an unknown signal xq £ 
from noisy and linearly degraded observations y £ MP . The forward observation 
model reads 

y = $x + w, (1) 

where w £ is the noise component and the mapping $ : R N — > MP is a known 
linear operator which generally models an acquisition process that entails loss 
of information so that Q ^ N. Even when Q — N, $ is typically ill-conditioned 
or even rank-deficient. In image processing, typical applications covered by 
the above degradation model are entry- wise masking (inpainting), convolution 
(acquisition blur), Radon transform (tomography) or a random sensing matrix 
(compressed sensing). 

Solving for an accurate approximation of Xq from the system is generally 
ill-posed PP. In order to regularize them and reduce the space of candidate 
solutions, one has to incorporate some prior knowledge on the typical structure 
of the original object Xq. This prior information accounts for the smoothness 
of the solution and can range from uniform smoothness assumption to more 
complex geometrical priors. 

Regularization is a popular way to impose such a prior, hence making the 
search for solutions feasible. The general variational problem we consider can 
be stated as 

x\(y) £ Argmin F(x, y) + XR(x), (2) 

where F is the so-called data fidelity term, R is an appropriate regularization 
term that encodes the prior on the sought after signal, and A > a regularization 
parameter. This parameter balances between the amount of allowed noise and 
the regularity dictated by R. In this paper, we consider a quadratic data fidelity 
term taking the form 

F(x,y) = ~\\y-$xf 2 . (3) 

If it were to be interpreted in Bayesian language, this data fidelity would amount 
to assuming that the noise is white Gaussian. 

The popular Thikonov class of regularizations corresponds to quadratic 
forms R(x) = (x, Kx), where K is a symmetric semidefinite positive kernel. 
This typically induces some kind of uniform smoothness on the recovered vec- 
tor. To capture the more intricate geometrical complexity of image structures, 
non-quadratic priors are required, among which sparse regularization through 
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the t norm has received a considerable interest in the recent years. This non- 
smooth regularization is at the heart of this paper. 



1.2. Sparse i 1 -Analysis Regularization 

We call a dictionary D — (di)f =1 a collection of P atoms di 6 K. . The 
dictionary may be redundant in R N , in which case P > N and $ if surjective if 
it has full row rank. D can also be viewed as a linear mapping from K p to R. 
which is used to synthesize a signal x £ Span(D) C M. N asi = Da — ^ i=1 CKjdj, 
where a is not uniquely defined if D is a redundant dictionary. 

The £ analysis regularization in a dictionary D corresponds to using R = Ra 
in ([2]) where 

R A (x) = \\D*x\\ 1 . (4) 

This leads us to the following minimization problem which is the focus of this 
paper 

x\{y) e Argmin \ \\y - $x\\ 2 2 + A . (V x (y)) 

Since the objective function in j^^g/)} is proper (i.e. not infinite everywhere), 
continuous and convex, the set of (global) minimizers of ( [Pa(2/)| I is nonempty 
and compact if, and only if, 

Kcr$nKerL>* = {0}, (H a ) 

All throughout this paper, we suppose that this condition holds. 

The most popular i l -analysis sparsity-promoting regularization is the to- 
tal variation, which was first introduced for denoising (in a continuous setting) 
in [2]. In a discrete setting, it corresponds to taking D* as a finite difference dis- 
cretization of the gradient operator. The corresponding prior Ra favors piece- 
wise constant signals and images. A comprehensive review of total variation 
regularization can be found in [3]. 

The theoretical properties of total variation regularization have been previ- 
ously investigated. A distinctive feature of this regularization is its tendency to 
yield a staircasing effect, where discontinuities not present in the original data 
might be artificially created by the regularization. This effect has been studied 
by Nikolova in the discrete case in a series of papers, see e.g. [3], and by Ring 
in [5] in the continuous setting. The stability of the discontinuity set of the 
solution of the 2-D continuous total variation denoising is studied in [B]. 

When D is the standard basis, i.e. D = Id, the analysis sparsity regulariza- 
tion Ra specializes to the so-called synthesis regularization. The corresponding 



variational problem (V\(y) I is referred to as the Lasso problem in the statistics 
community [7] and Basis-Pursuit DeNoising (BPDN) in the signal processing 
community [5]. Despite synthesis and analysis regularizations both minimize 
objective functions involving the i 1 norm, the properties of their respective min- 
imizers may differ significantly. Some insights on the relation and distinction 
between analysis and synthesis-based sparsity regularizations were first given in 
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[5] . When D is orthogonal, and more generally when D is square and invertible, 
analysis and synthesis regularizations are equivalent in the sense that the set of 
minimizers of one problem can be retrieved from that of an equivalent form of 
the other through a bijective change of variable. However, when D is redundant, 
synthesis and analysis regularizations depart significantly. 

While the theoretical guarantees of synthesis ^-regularization have been 
extensively studied, the analysis case remains much less investigated (TU1 fTTl 

nana. 

1.3. Geometrical Insights into t l - Analysis Regularization 

In the synthesis prior, sparsity of a vector a e R p is measured in terms of 
its £ pseudo-norm, or equivalently the cardinality of its support supp(a), i.e. 

|M| = | supp(a)| = | {i G {1, • • • ,P] \ a t ? 0} |. 

In the analysis prior, the sparsity is measured on the correlation vector D*x. 
It then appears natural to keep track of the support of D*x. To fix terminology, 
we define this support and its complement. 

Definition 1. The D-support I (resp. £>-cosupport J) of a vector x £ M. N is 
defined as I = supp(_D*a;) (resp. J = I c = {1, . . . , P} \ I). 

A vector x with a D-cosupport J is then such that the correlations between 
this vector and the columns of Dj are zero. This is equivalent to saying that x 
lives in a subspace Qj defined as follows. 

Definition 2. Given J a subset of {1 ■ • • P}, the cospace Qj is defined as 

Qj = KerD*j. 

It was shown in |13j that the subspace Qj plays a pivotal role in robustness 
and identifiability guarantees of \P\ (y)\ . 

In fact, the subspaces Qj carry all necessary information to characterize 
signal models of sparse analysis type. More precisely, vectors of cosparsity 
k = | J\ live in an union of subspaces 

e fc = {Qj \ J C {1 • ■ ■ P} and dim^j - k} , 

and the signal space can be decomposed as M. N = Ufce{o n} This 
model has been introduced in |12j under the name cosparse model. 

For synthesis sparsity, i.e. D = Id, 9^ are nothing but the set of axis- 
aligned subspaces of dimension k. For the 1-D total variation prior, where D 
corresponds to finite forward differences, Ok is the set of piecewise constant 
signals with k — 1 steps. A few other examples of subspaces 0fc, including those 
corresponding to translation invariant wavelets, are discussed in |12j . More 
general union of subspaces models have been introduced in sampling theory to 
model various types of non- linear signal ensembles, see e.g. [2] . 
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1.4- Local Behavior of Minimizers 

Local variations and sensitivity/perturbation analysis of problems in the 
form of (|) is an important topic in optimization and optimal control. Compre- 
hensive monograph on the subject are [151 116) . In this paper, we focus on the 
variations with respect to the regularization parameter A and the observations 
y, i.e we study the set-valued mapping (X,y) ^ M.\{y) where M.\(y) is the set 
of minimizers of ([2]) . 

In the synthesis case (D = Id) with Q > N, the work of P~7] [TB] showed 
that, for a fixed y, the mapping A i-> x\(y) is piecewise affine, i.e. the solution 
path is polygonal. Further, they characterized changes in the solution x\{y) at 
vertices of this path. Based on these observations, they presented the homotopy 
algorithm, which follows the solution path by jumping from vertex to vertex 
of this polygonal path. This idea was extended to the underdetermined case 
in pi|l [20] ■ A homotopy- type scheme was proposed in [ST] for sparse i 1 -analysis 
regularization in the overdetermined case (Q > N). We will discuss the latter 
work in more detail in Section [4] 

1.5. Risk Estimation and Parameter Selection 

This paper also addresses unbiased estimation of the ^ 2 -risk when recovering 
a vector xq € from the measurements y in ([I]), e.g. by solving ([2]), under the 
assumption that w is white Gaussian noise. A central concept for risk estimation 
is that of the degrees of freedom (DOF). Let xg(y) be an estimator of Xq from 
([I]), parameterized by some parameters 9. The DOF of such an estimator was 
defined by Efron [32] as 



The DOF is generally used to quantify the complexity of a statistical modeling 
procedure. It plays a central role in many model validation and selection criteria, 
e.g. Mallows' C p (Mallows [53]), AIC (Akaike information criterion [Mj), BIC 
(Bayesian information criterion |25j, GCV (generalized cross-validation |26| ) 
or SURE (Stein Unbiased Risk Estimator [22]). In the spirit of the SURE 
theory, a good unbiased estimator of the DOF is sufficient to provide an un- 
biased estimate of the £ 2 risk in reconstructing $Xo, i.e. the prediction risk 
E w (\\$xe(y) — &Xo\\ 2 ). For instance, the SURE is given by 




s\JBE(x g (y)) = \\y 



<5>x e {y)\\l-Q ( j 2 + 2a 2 df {y) 



(5) 



with 





is the Jacobian matrix of the vector function 



y h-> Qxg (y) and tr is the trace operator. 
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The SURE depends solely on y, without prior knowledge of xq. This can 
prove very useful as a basis to automatically choose the parameters 9 of the 
estimator, e.g. 6 — A when solving |2]), or the smoothing parameters in families 
of linear estimates [28J such as for ridge regression or smoothing splines. In 
some settings, it has been shown that it offers better accuracy than GCV and 
related non-parametric selection techniques However, compared to GCV, 
the SURE requires the knowledge of the noise variance a 2 . 

The SURE has been widely used in the statistics and signal processing com- 
munities as a principled and efficient way for parameter selection with a vari- 
ety of non-linear estimators. For instance, it was used for wavelet denoising 
[301 I3TI 152] . wavelet shrinkage for a class of linear inverse problems [33J and 
non-local filtering [341135] 135], 

For general linear inverse problems, the estimator of the prediction risk and 
the paramctcr(s) minimizing it can depart substantially from those correspond- 
ing to the estimation risk E w (\\xg(y) — xo\\ 2 ) [37]. To circumvent this difficulty, 
in |38] , the authors attempted to approximate the estimation risk by relying on 
a regularized version of the inverse of $. However, in general, either $ should 
have a trivial kernel or, otherwise, xq should live outside to ker($) to guarantee 
the existence of an unbiased estimator of the estimation risk j3j|] . 

In [40], a generalized SURE (GSURE) has been developed for noise models 
within the multivariate canonical exponential family. This allows one to derive 
an unbiased estimator of the risk on a projected version of xg(y), which in 
turn covers the case where $ has a non-trivial kernel and a part of xo is in it. 
Indeed, in the latter scenario, the GSURE can at best estimate the projection 
risk E^djnJgd/) — n^o)!!^) where n is the orthogonal projector on ker($)- L . 

1.6. Contributions 

This paper describes the following contributions: 

(i) Local affine parameterization: we show that any solution x\(y) of 
^a(z/)| ) is a piecewise affine function of (y, A). Furthermore, for fixed A, 
and for y outside a set of Lebesgue measure zero, the prediction n\{y) 
locally varies along a constant subspace. This is a distinctly novel contri- 



bution which generalizes previously known results (see Section 4.1 for a 
detailed discussion). It also forms the cornerstone of unbiased estimation 
of the DOF. 

(ii) GSURE: we derive a unifying framework to compute unbiased estimates 
of several risks in £ 2 sense, for estimators of xo from y as observed in 
([!]) when w is a white Gaussian noise. This framework encompasses for 
instance the prediction, the projection and the estimation risks (see Sec- 
tion [O] for a discussion to related work). 



(iii) ^-Analysis Unbiased Risk Estimation: combining the results from 
the previous two contributions, we derive a closed-form expression of an 
unbiased estimator of the DOF for ( ^(j/)! , whence we deduce GSURE 
estimates of the different risks. 
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(iv) Numerical Computation of GSURE: we also address in detail numer- 
ical issues that rise when implementing our DOF estimator and GSURE 
f° r p-'xjy)^ ■ We show that the additional computational effort to compute 
the DOF estimator (hence the GSURE) from its closed-form is invested 
in solving simple linear systems. This turns out to be much faster than 
iterative approaches existing in the literature which are computationally 



demanding (see Section 4.4 for a detailed discussion) 



1.7. Organization of the Paper 

The rest of the paper is organized as follows. Section [2] and [3] describe each 
of our main contributions. Section 0] draws some connections with relevant 
previous works. Section [5] illustrates our results on some numerical examples. 
The proofs are deferred to Appendix [X] awaiting inspection by the interested 
reader. 



1.8. Notation 

We first summarize the main notations used throughout the paper. We focus 
on real vector spaces. The sign vector sign(a) of a € K p is 



Vi e {!,..., P}, sign(a) 4 = < 



+ 1 


-1 



if an > 0, 
if Qfj = 0, 
if <X; < 0. 



Its support is 



supp(a) = {*€{!,..., P}\ai^0}. 



For a subset I C E, \I\ will denote its cardinality, and I c = E\I its complement. 

The matrix Mj for J a subset of {1, ... , P} is the submatrix whose columns 
are indexed by J. Similarly, the vector sj is the restriction of s to the entries 
of s indexed by J. 

tr and div are respectively the trace and divergence operators. The matrix 
Id is the identity matrix, where the underlying space will be clear from the 
context. For any matrix M, M + is its Moore-Penrose pseudoinverse and M* is 
its adjoint. 



2. Perturbation Theory of ^-Analysis Regularization 

Throughout this section, it is important to point out that we only require 
that the noise vector w £ to be bounded. The fact that it could be deter- 
ministic or random is irrelevant here. 
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2.1. Local At 



Parameterization 



Our first contribution derives a local affine parameterization of minimizers of 
( jT^A as functions of (y, A) S K 1 ^ xR + . To develop our theory, the invertibility 
of $ on Qj will play a vital role. For this, we need to assume that 



Ker$n£j = {0}. 



To intuitively understand the importance of this assumption, think of the ideal 
case where one wants to estimate a D-sparse signal x from y — &x + w, whose 
D-cosupport J is assumed to be known. This can be achieved by solving a 



least-squares problem. The latter has a unique solution if (Hj) holds 



Of course, J is not known in general, and one may legitimately ask whether 
\H,j\ is fulfilled for some solution of fP\ (?/)[ ). We will provide an affirmative 
answer to this question in Theorem [2jh) , i.e. there always exists a solution of 



\P\{y)\ such that pTJ]) ho lds 



With assumption (Hj) at hand, we now define the following matrix whose 



role will be clarified shortly. 



Definition 3. Let J be a D-cosupport. Suppose that (Hj) holds. We define 
the matrix as 

T [J] = U(U*<f>*$U)~ 1 U*. (6) 
where U is a matrix whose columns form a basis of Qj. 

Observe that the action of could be rewritten as an optimization problem 

r^ti = argmin - H'&xll 2 — (x, u). 
D*x=0 2 



Let us now turn to sensitivity of the minimizers x\(y) of | fP\(3/)P to per- 
turbations of (y, A). More precisely, our aim is to study properties, including 
continuity and differentiability, of x\{y) and <frx\(y) as functions of y and A. 
Toward this end, we will exploit the fact that x\(y) obeys an implicit equation 
given in Lemma [2] (see Appendix A.2[ ). But as optimal solutions turns out to be 
not everywhere differentiable (change of the Z?-support and thus of the cospace), 
we will concentrate on a local analysis where (y, A) vary in a small neighborhood 
that typically avoids non-differentiability to occur. This is exactly the reason 
why we introduce the transition space % defined below. It corresponds to the 
set of observation vectors y and regularization parameters A where the cospace 
Q j of any solution of ( V\ (y) I is not stable with respect to small perturbations 
of (y,X). 

Definition 4. The transition space % is defined as 



n 



u 



u 



■TCjl," ,P} KCJ Sj 

fHj} holds Imn [J1 gIm_D. 7 \ A - 



U U n ^ 

,e{-i,i}i J0 i o-Kef-i,!} 1 * 1 
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where 

Hj, k , Sjc ,„ k = {(y, A) eR Q xR + \ Pg JXK & J k = Pg JXK X(n^ Sj . - D K a K )) , 

with fl[ J ] = $*($rl j ]$* - Id), f2W = ($*$r[ J l - ld)Djc and Pg jyK is the 
orthogonal projector on Qj\k- 

The following theorem summarizes our first sensitivity analysis result on the 
optimal solutions of \P\{y)\ - 

Theorem 1. Let (y, A) ^ H and let x\(y) be a solution of (y)\ . Let L and 
J be the D-support and D-cosupport of x\{y) and s = sign(D*x\(y)). Suppose 
that (Hj) holds. For any y £ R® and X £ R + , define 

x\{y) = r[ J ]$*y-Ari J i£> /S/ . 

There exists an open neighborhood B C R® x R + of (y, A) such that for every 
(y, A) £ B, xt(y) is a solution of (V\(y)). 

An immediate consequence of this theorem is that, for a fixed y £ MP, if 
CP\{y)\ admits a unique solution x\(y) for each A, then {x\{y) : A £ R + } 



identifies a polygonal solution path. As we move along the solution path, the 
cospace is piecewise constant as a function of A, changing only at critical values 
corresponding to the vertices on the polygonal path. 

2.2. Local Variations of the Prediction 

We now turn to quantifying explicitly the local variations of the prediction 
/j,\(y) = &x* x (y) with respect to the observation y. First, it is not difficult to 
see that even if ( 1^(^)1 ) admits several solutions, all of them share the same 
image under <£>; see Lemma [4] for a formal proof of this assertion. This allows 
to denote without ambiguity H\(y) as a single- valued mapping. Before stating 
our second sensitivity analysis result, we need to define the restriction to R® of 
the transition space T-L. 

Definition 5. Let A £ R* + . The A-restricted transition space is 
U..x 



{y£RQ\(y,X)£H} . 



Theorem 2. Fix A £ R* + . Then, 

(i) The X-restricted transition space W. t \ is of Lebesgue measure zero, 
(ii) For y ^ H. t \, there exists x\{y) a solution of §P\{y)\ with a D-cosupport 



J that obeys (Hj I 



(Hi) The mapping y i— > n\{y) is of class C°° onR^\TL.,\ (a set of full Lebesgue 
measure), and 



9^* x (y) 

dy 

where J is such that (Hj) holds. 



= $r [J1 $* , (7) 
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3. Generalized Stein Unbiased Risk Estimator 



Throughout this section, for our statements to be statistically meaningful, 
the noise is assumed to be white Gaussian, w ~ A/"(0, er 2 Idg) of bounded variance 



3.1. GSURE for an Arbitrary Estimator 

We first consider an arbitrary estimator xg{y) with parameters 9 such that 
jUe(y) = &xg(y) is a single- valued mapping. We similarly write /j,q = $xo. Of 
course the results described shortly will apply when the estimator is taken as 
any minimizer of fPx(y)| ), in which case 6 = A. 

We here develop an extended version of GSURE that unbiasedly estimates 
the risk of reconstructing A^iq with an arbitrary matrix A € IR Mx< 2. This allows 
us to cover in a unified framework unbiased estimation of several classical risks 
including the prediction risk (with A = Id), the projection risk when $ is rank 
deficient (with A = $*($$*) + ), and the estimation risk when $ has full rank 
(with A = $ + = ($*$) _1 $*). A quantity that will enter into play in the risk 
of estimating A/i is the degrees of freedom defined as 

Af A _ v cov w ((Ay) i: (Afi 9 (y))i) 

i=l 

Definition 6. Let A £ R Mx ® . We define the Generalized Stein Unbiased Risk 
Estimate (GSURE) associated to A as 

GS\JRE A (xg(y)) = \\A(y - Mv))\\l ~ °* tT ( A * A ) + 2 ^h{v) » 

where 

dfj(y) = tr(A d -t0lA< 

Unbiasedness of the GSURE. The next result shows that GSURE A (x6>(?/)) is 

an unbiased estimator of an appropriate £ 2 risk, and dfg (y) is an unbiased 
estimator of dff 

Theorem 3. Let A € R Mx< 2. Suppose that y i-> /2g(y) is weakly differentiable. 
so that its divergence is well-defined in the weak sense. If y = $x + w with 
Af(0,a 2 ld Q ), then 



E w GS\JKE A (x e (y)) = E w [\\A^ Q - Afle{y)\\ 2 ) and E w df g (y) = dff . 

Remark 1. Theorem [3] can be straightforwardly adapted to deal with any white 
Gaussian noise with a non-singular covariance matrix S. It is sufficient to 
consider the change of variable y H> £ _1 / 2 y and <& i-> S _1 / 2 $. This is in the 
same vein as 1401. 
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All estimators of the form GSURE S with B such that B<& = A$ share the 
same expectation given by Theorem[3j Hence, there are several ways to estimate 
the risk in reconstructing AfiQ. For the estimation of the prediction, projection 
and estimation risks, we now give the corresponding expressions and associated 
estimators (with subscript notations) as direct consequences of Theorem [3j 

• A = Id: in which case GSURE Id becomes 

GSURE*(x e (j/)) = \\y - My)\\l - Q° 2 + 2<x 2 tr (j^^ 

which provides an unbiased estimate of the prediction risk 

Risk$(xo) = E„, \\$x e (y) - ®x \\l ■ 
This coincides with the classical SURE defined in 

• A = $*($$*)+; when $ is rank deficient, II = $*($$*)+$ is the orthog- 
onal projector on ker($)^ = Im($*). Denoting xml{v) — (&&*) + y the 
maximum likelihood estimator (MLE), GSURE** { *** )+ becomes 

GSUREn(^(y)) = ||a;ML(y)-nS e (2;)||^ ( 7 2 tr(($$*)+) 



+2a 2 tr ($$ . 

It provides an unbiased estimate of the projection risk 
Risk n (a;o) = E w \\Ux g (y) - Ux \\l ■ 

If <& is the synthesis operator of a Parseval tight frame, i.e. $<!>* = Id, the 
projection risk coincides with the prediction risk and so do the corresponding 
GSURE estimates 

Riskn(zo) = Risk$(a; ) and GSURE n ($<j(2/)) = GSURE$(x (y)) . 

It is also worth noting that if xg(y) never lies in ker($), then Riskn(:ro) 
coincides with the estimation risk up to the additive constant ||(Id — II)xo|| 2 . 

• A = in this case $ has full rank, and the mapping y i-> x$(y) is 

single-valued and weakly differcntiable. The maximum likelihood estimator 
is now XMh(y) = , and GSURE < -***'~ 1 ** takes the form 

GSUREi d (x e (y)) = \\xuUy) - Mv)\\l ~ °* tr 

-idx e (y)" 



+2aHr 

This is an unbiased estimator of the estimation risk given by 
Riski d (xo) = E w \\x s (y) - x \\ 2 ■ 
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Reliability of the GSURE. We now assess the reliability of the GSURE by 
computing the expected squared-error between GS\JRE A (xg(y)) and the true 
squared-error on A(xq 

SE A (x e (y)) = \\Afx -A-fi e (y)\\l ■ 
Theorem 4. Under the assumptions of Theorem^ we have 



(GSURE A (x e (y)) - SE A (x e (y))Y 
2a 4 tr[{A*A) 2 ] 
+ Aa 2 E w \\A*A(^ -My))\\ 2 2 



4ct 4 E„, tr 



A °1M A * A 

oy 



2Id 



dfiejy) 
dy 



A* 



3.2. GSURE for t l -Analysis Regularization 

We now specialize the previous results to the case where the estimator xg (y) 
is a solution of \P\{y)\ ; i.e. xg{y) — x\{y) and j2e(y) — H\(y). For notational 
clarity and to highlight the dependency of dim(^j) on y, for y ^ 'H■,\^ we write 
d(y) = dim(Cfj) where J is the Z}-cosupport of any solution x\(y) such that \Hj\ 
holds. We then obtain the following corollary as a consequence of Theorems [2] 
and[3l 



Corollary 1. Let y = $xo 

differentiate and 



w with w ~ A/"(0, o~ 2 \&q). Then n\{y) is weakly 



\v-vl{y)\\ 



Qa 2 + 2a 2 d(y), 

2 



GSURE* (y)) 

GSURE n (^(y)) = \\x ML (y) - Uxl(y)f 2 - o 2 tr(($$*) + ) + 2a 2 tr(rFf[ J ]), 

GSURE Id (4(y)) = \xmXv) ~ x\{y)\\l ~ °* tr(($*$)-i) + 2a 2 tr(rW) . 

Moreover, d(y) is an unbiased estimator of the DOF of \P\{y)\ , i.e. 

df x = df{ d =E w d(y). 

In particular, this result states that dim((?j) is an unbiased estimator of the 
DOF of ( 1^(^)1 response without requiring any assumption to ensure unique- 
ness of x\(y). This DOF estimator formula is valid everywhere except on a set 
of (Lebesgue) measure zero. 

Building upon Theorems [2] and |4j we derive the relative reliability of the 
GSURE for \P\ {y)\ , and show that it decays with the number of measurements 
at the rate 0(l/Q). 

Corollary 2. Let A e R Mx Q and y = $x + w with w ~ Af(0, er 2 Id Q ). Then 
^ GSURE- 4 (x * x (y)) - SE A (x\ (y))\ 2 



Qo- 2 



= O 



Q 



where \\A\\ is the spectral norm of A. In particular, if\\A\\ is independent ofQ, 
the decay rate of the relative reliability is 0(\/Q). 
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3.3. Numerical Considerations 

The remaining obstacle faced when implementing the GSURE formulae of 
Corollary[l]is to compute the divergence term, i.e. the last trace term as given by 
dfxiv) = tr(A$rI J ]$*A*) ( see Definition 6k. However, for large scale-data as in 
image and signal processing, the computational storage required for the matrix 
in the argument of the trace would be prohibitive. Additionally, computing 
can only be reasonably afforded for small data size. Fortunately, the structure 

of df ^ (y) and the definition of allows to derive an efficient and principled 
way to compute the trace term. This is formalized in the next result. 

Proposition 1. One has 

dft(y)=E z ((p(Z),$*A*AZ)) (8) 

where Z ~ 7V(0, Idp), and where for any z G M. p , v = v{z) solves the following 
linear system 

Dj\ (v\ _ f$*z\ (g) 



Dj J \v) \ 



In practice, the empirical mean estimator is replaced for the expectation in 
([8|, hence giving 

I 5>(*0> &A'Az<) ™ dff(y) , (10) 
i=l 

for k realizations Zj of Z, where WLNN stands for the Weak Law of Large 
Numbers. Consequently, the computational bulk of computing an estimate of 

^ A 

dfx (y) i s invested in solving for each v{zi) the symmetric linear system ^ using 
e.g. a conjugate gradient solver. 



4. Relation to Other Works 



4-1. Local variations 

The local behavior of x\(y) as a function of A is already known in the l 1 - 
synthesis case, both for the case where $ is full rank [171 HH] j and Q < N [5D] . 
Our local affine parameterization in Theorem [l] generalizes these results to the 
analysis case regardless of the number of measurements. Our result also goes 
beyond the work of [21] which investigates the overdetermined case with an 
^-analysis regularization and develops a homotopy algorithm. 

4-2. Degrees of freedom 

In the synthesis overdetermined case with full rank $, [H] showed that 
the number of nonzero coefficients is an unbiased estimate for the degrees of 
freedom of ( [Pa(2/)| I • This was generalized to an arbitrary <!> in [42] . Corollary [l] 
encompasses these results as special cases by taking D = Id. 
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For the ^-analysis regularization with full rank <£>, Tibshirani and Taylor 
PT] showed that dfx — E w dim((?j), where J is the D-cosupport of the unique 
solution to ( 1^(^)1 ) • This is exactly the assertion of Corollary [I] since ( Hj ) is 
in force when rank($) = N. 

While a first version of this paper was submitted, it came to our attention 
that Tibshirani and Taylor [33] Theorem 3] recently and independently devel- 
oped an unbiased estimator of the DOF for ( ^(y)! ) that covers the case where 
Q < N. More precisely, they showed that dim($(Cf,/)) is an unbiased estimator 
of df(X), where J is the D-cosupport of any solution to \P\(y)\ - This coincides 
with Corollary [I] when J satisfies (Hj). Their proof however differs from ours, 



and in particular, its does not study directly the local behavior of x* x (y) as a 
function of y or A (Theorem [T]) . 

4-. 3. Generalized Stein Unbiased Risk Estimator 

In [30J , the author derived expressions equivalent to GSUREn and GSUREid 
up to a constant which does not depend on the estimator. However, her expres- 
sions were developed separately, whereas we have shown that these GSURE 
estimates originate from a general result stated in Theorem [3] Another distinc- 
tion between our work and [30] lies in the assumptions imposed. The author [30] 
supposes xg(y) to be a weakly differentiable function of $*y/cr 2 . In contrast, 
we just require that the prediction y i— > fie(y) (a single- valued map) is weakly 
differentiable, as classically assumed in the SURE theory. 

Indeed, let u = $>*y/a 2 , and define xg(y) = Zg(u). Assume that u Zg(u) 
is weakly differentiable (and a fortiori a single- valued mapping). 

When $ is rank deficient, [30] proves unbiasedness of the following estimator 
of the projection risk 

GSURE[f ldar) (4( u )) = \\Ux \\l + \\Uz* e (u)\\l - 2(z* e (u), x ML (y)) 

r 9Zg( u T 



2tr|II- 

du 

Since by assumption d ^g a } u " > = $ dZ Q^ , and using the chain rule, the following 
holds 

a 2 tr ( =aHr ( = tr ( 



dy J \ du dy J \ du 

whence it follows that 

GSUREn (Mv)) ~ GSURE^ EldaT) (^(y)) = \\x ML (y)\\ 2 2 - \\Ux \\ 2 2 

-cr 2 tr(($$*)+) . 

A similar reasoning when <!> has full rank leads to 

GSURE Id (£ e (y)) - GSUREj ( f dm '\x e (y)) = \\xmM\1 ~ \\ x o\\l 

-cr 2 tr(($*$)- 1 ) . 
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Both our estimators and those of [JD] are unbiased, but they do not have 
necessarily the same variance. Given that they only differ by terms that do 
not depend on x$(y), and in particular on the parameter (here 9), selecting the 
latter by minimizing our GSURE expressions or those of [ID] is expected to lead 
to the same results. 

Let us finally mention that in the context of deconvolution, GSUREn boils 
down to the unbiased estimator of the projection risk obtained in (44] . 

4-4- Numerical computation of the GSURE 

In least-squares regression regularized by a sufficiently smooth penalty term, 
the DOF can be estimated in closed-form |45j . However even in such simple 
cases, the computational load and/or storage can be prohibitive for large-scale 
data. 

To overcome the analytical difficult for general non-linear estimators, when 
no closed-form expression is available, first attempts developed bootstrap-based 
(asymptotically) unbiased estimators of the DOF [25] . Ye [46] and Shen and 
Ye [57] proposed a data perturbation technique to approximate the DOF (and 
the SURE) when its closed-form expression is not available or numerically ex- 
pensive to compute. For denoising, a similar Monte-Carlo approach has been 
used in [48] where it was applied to total-variation denoising, wavelet soft- 
thresholding, and Wiener filtering/smoothing splines. 

Alternatively, an estimate can be obtained by recursively differentiating the 
sequence of iterates that converges to a solution of the original minimization 
problem. Initially, it has been proposed by [35], and then refined in [35], to 
compute the GSURE of sparse synthesis regularization by differentiating the se- 
quence of iterates of the forward-backward splitting algorithm. We have recently 
proposed a generalization of this methodology to any proximal splitting algo- 
rithm, and exemplified it on i 1 -analysis regularization including the isotropic 
total-variation regularization, and i 1 — £ 2 synthesis regularization which pro- 
motes block sparsity [50] . 

In our case, we have shown that the computation of a good estimator of the 
DOF, and therefore of GSURE' 4 for various risks, boils down to solving linear 
systems. This is much more efficient than the previous general-purpose iterative 
methods that are computationally expensive. 



5. Numerical Experiments 



In this section, we exemplify the usefulness of our GSURE estimator which 
can serve as a basis for automatically tuning the value of A. This is achieved by 
computing, from a single realization of the noise w ~ Af(0, c 2 Id), the parameter 
A that minimizes the value of GSURE when solving (V\(y) ) from y = + w 
for various scenarios on <E> and xq. 
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(b) x* x (y) at the optimal A 



(c) xo 




4 1 , , , , 

2 4 6 8 10 

Regularization parameter X 

(d) 



Figure 1: Illustration of the selection of A by minimizing GSUREn in a super-resolution 
problem (Q/N = 0.5) with anisotropic total variation regularization. (a) The observed image 
y. (b) A solution x* x (y) of \P\ (y)\ at the optimal A (the one minimizing GSUREn). (c) The 
underlying true image xq. (d) Projection risk Riskn and its GSUREn estimate obtained from 
| |10| l using k = 1 random realization. 

5.1. Computing Minimizers 

Denoising. Although it is convex, solving problem ( j^A i s rather challenging 
given its non-smoothness. In the case where $ = Id, the objective functional 
of \V\{y)\ is strictly convex, and one can compute its unique solution x\(y) by 
solving an equivalent equivalent Fenchel-Rockafellar dual problem [5T] 

x\(y) = y + Da\(y) where Oi\{y) £ Argmin ||y + Z>Q!|| 2 . 

ll"lloo^ A 

This dual problem can be solved using e.g. projected gradient descent or a 
multi-step accelerated version. 

General Case. The proximity operator of x <— > \\D*x\\ 1 is not computable in 
closed- form for an arbitrary dictionary D. This precludes the use of popular 
iterative soft-thresholding (actually the forward-backward proximal splitting) 
without sub-iterating. We therefore appeal to a more elaborate primal-dual 



1G 




Figure 2: Illustration of the selection of A by minimizing GSUREn in a compressed sens- 
ing problem (Q/N = 0.5) by an ^-analysis regularization in a shift-invariant Haar wavelet 
dictionary, (a) The MLE iml- (b) A solution x* x (y) of jT^Afa)! at the optimal A (the one 
minimizing GSUREn). (c) The underlying true image xq. (d) Projection risk Riskn and its 
GSUREpi estimate obtained from l |10| using k = 1 random realization. 

splitting algorithm. We use in our numerical experiments the relaxed Arrow- 
Hurwicz algorithm as revitalized recently in |52j . This algorithm achieves full 
splitting where all operators are applied separately: the proximity operators of 
g *—> 7} \\y — g\\1 and u *— > A ||u|| x (which are known in closed- form) , and the linear 
operators $ and D and their adjoints. To cast ( | 7^ (?/)[ ) in the form required to 
apply this scheme, we can rewrite it as 

F(g,u) = \\\y-g\\l + \\\u\\ 1 
K{x) = (<Px,D*x). 

Note that other algorithms could be equally applied to solve ( | "Pa (?/)[ ) , e.g. [551 



min F(K(x)) where , , 



5.2. Parameter Selection using the GSURE 

Super-Resolution with Total Variation Regularization. In this example, $ is a 
vertical sub-sampling operator of factor two (hence Q/N = 0.5). The noise 
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level has been set such that the observed image y has a peak signal-to-noise 
ratio (PSNR) of 27.78 dB. We used an anisotropic total variation regularization; 
i.e. the sum of the £ 1 -norms of the partial derivatives in the first and second 
direction (not to be confused with the isotropic total variation). Fig.[T]d depicts 
the projection risk and its GSUREn estimate obtained from ( |To| ) with k = 1 
as a function of A. The curves appear unimodal and coincide even with k = 1 
and a single noise realization. Consequently, GSUREn provides a high-quality 
selection of A minimizing the projection risk. Close-up views of the central parts 
of the degraded, restored (using the optimal A), and true images are shown in 
Fig. [TJa)-(c) for visual inspection of the restoration quality. 

Compressed Sensing with Wavelet Analysis Regularization. We consider in this 
example a compressed sensing scenario where $ is a random partial DCT mea- 
surement matrix with an under-sampling ratio Q/N = 0.5. The noise is such 
that input image y has a PSNR set to 27.50 dB. We took D as the shift-invariant 
Haar wavelet dictionary with 3 scales. Again, we estimate GSUREn with k = 1 



in (10 1. The results observed on the super-resolution example are confirmed 



in this compressed sensing experiment both visually and qualitatively, see Fig. [2j 



6. Conclusion 



In this paper, we studied the local behavior of solutions to ^-analysis reg- 
ularized inverse problems of the form fPx(2/)| ) • We proved that any minimizer 
x \{v) °f CP\(y) I is a piecewise affine function of the observations y and the 



regularization parameter A. This local affine parameterization is completely 
characterized by the Z?-support / of x\ (y), i.e. the set of indices of atoms in D 
with non-zero correlations with x* x (y). As a byproduct, for y outside a set of 
zero Lebesgue measure, the first-order variations of $>x\(y) with respect to y 
are obtained in closed-form. 

We capitalized on these results to derive a closed-form expression of an un- 
biased estimator of the degrees of freedom of p-'xjy)^ , and to objectively and 
automatically choose the regularization parameter A when the noise contami- 
nating the observations is additive-white Gaussian. Toward this goal, a unified 
framework to unbiasedly estimate several risk measures is proposed through 
the GSURE methodology. This encompasses several special cases such as the 
prediction, the projection and the estimation risk. A computationally efficient 
algorithm is designed to compute the GSURE in the context of ^-analysis re- 
construction. Illustrations on different imaging inverse problems exemplify the 
potential applicability of our theoretical findings. 

A. Proofs 

Throughout, we use the shorthand notation C Vl \ for the objective function 
in fhJjfil 

£ ytX (x) = ^\\y-$xf 2 + \\\D*x\\ 1 . 
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We remind the reader that condition ( Hq I is supposed to hold true in all our 
statements. 



A.l. Preparatory lemmata 

The following key lemma will be central in our proofs. It gives the first 
order necessary and sufficient optimality conditions for the analysis variational 
problem $P\(y)\ . 

Lemma 1. A vector x\(y) is a solution of §P\{y)\ if, and only if, there exists 
a G Kj J l, where J is the D-cosupport of x\(y), such that 

*€Z y , x (xl(y)) (11) 

with 

z y ,\{xi(y)) = {° g K ' J| \ - v) + xd i s i + xd j° = o 

and |HL<l}, (12) 

where I = J c is the D-support of x\(y) and s — sign(_D* x\(y)) . 

Proof. The subdifferential of a real-valued proper convex function F : R w — > 
K U {oo} is denoted OF. From standard convex analysis, we recall of dF at a 
point x in the domain of F 

dF(x) = {g € R N \Vz E R N , F(z) ^ F(x) + (g, z - x)} . 

It is clear from this definition that x\(y) is a (global) minimizer of F if, and 
only if, € dF{x). By classical subdifferential calculus, the subdifferential of 
£y,\ at x is the non-empty convex compact set 

dC VtX {x) = {$*($£ -y) + XDu \utR N : Ui = siga(D*x)i and WujW^ ^ l} . 

Therefore € dC y \(x\ (y)) is equivalent to the existence of u € Mr such that 
ui = sign(D*x* x (y)) I and H^H^, < 1 satisfying 

$*($.T^(y)-y) + AL>u = 0. 

Taking a = uj, this is equivalent to a € Y, y \(x\\y)\ □ 

The following lemma gives an implicit equation satisfied by any (non neces- 
sarily unique) minimizer x\(y) of fPx(2/)| )' 

Lemma 2. Let x\(y) be a solution of ^P\{y)\ - Let I be the D-support and J 
the D-cosupport of x\{y) and s = sign(£>* x\{y)) . We suppose that (Hj) holds. 
Then, x\(y) satisfies 

xl(y)^r^^y~XT^D lSl . (13) 
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Proof. Owing to the first order necessary and sufficient optimality condition 
(Lemma|lj, there exists a € ^y,\(x\(y)) satisfying 

<f>*($xl(y)-y) + \D I s I + \D J a = 0. (14) 

By definition, x* x (y) € Qj = (ImDj) 1 . We can then write x x (y) — Ua for some 
a e M dim ( e -'). Since U*Dj = 0, multiplying both sides of §H§ on the left by 

U* , we get 

[/*$*($£/« -y) + XU*Disi = 0. 

Since [/*<!?* is invertible, the implicit equation of x\(y) follows immediately. 

□ 

Suppose now that a vector satisfies the above implicit equation. The next 
lemma derives two equivalent necessary and sufficient conditions to guarantee 
that this vector is actually a solution to ( | "Pa (?/)[ )■ 



Lemma 3. Let y € R^, let J a D-cosupport such that (Hj) holds and let 
I = J c . Suppose that x\(y) satisfies 

x* x (y) =rM$*y - XT^D lSl . 



where s — sign(D* x* x (y)) . Then, x\{y) is a solution of (V\(y) \ if and only if, 
there exists a € R' J ' satisfying one of the following equivalent conditions 

a -n [J] Sl + ^U [J] y EKciDj and < 1, (15) 

A 

or 

U [J] y- X& [J] Si +XDja = and < 1, (16) 

where ftM = ($*$rl j l - Id)L> 7 , ftM = $*($rl j l$* - Id), fl^ = D+fl^ an d 

ni J i = D+& J l 

Proof. First, we observe that x\(y) € Qj. According to Lemma [Tj x\(y) is a 
solution of (V\{y)) if, and only if, there exists a £ T, y _\(x x (y)). Since ( |ijj| ) 
holds, r[ J l is properly defined. We can then plug the assumed implicit equation 
in |l2]) to get 

$*($r [J] $*y - \<PT [J] D ISI -y) + \D ISI + \D.ja = 0. 
Rearranging the terms multiplying y and s/, we arrive at 

$*($r [J1 $* - Id)y - A($*$r [J1 - Id)D lSl + XDjo- = 0. 

This shows that x\(y) is a minimizer of ( 1^(^)1 ) if, and only if 

fl [J] y - XCl [J] Sl + XDjo- = and < 1. 

To prove the equivalence with (16), we first note that t/*i¥ J J = im- 
plying that Im(r2[ J l) C Im(Dj). Since DjD^j is the orthogonal projector on 
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Im(Dj), we get Q.W = DjD+fiM = D,0 J l With a similar argument, we get 
nl J ] = DjU^. Hence, the existence of a £ Y, yi \(x\(y)) such that WaW^ ^ 1 is 
equivalent to 

Dja = DjQW Sj - -DjJI^y where < 1, 

which in turn is equivalent to 

a - n [J] Sl + \u [J] y £ Kcr Dj where IHI^ ^ 1. 
A 

□ 

We now show that even if uPa(j/)| admits several solutions X\(y), all of 
them share the same image under $, which in turn implies that y h- ^ f-xiv) * s a 
single- valued mapping. 

Lemma 4. If x\ and X2 are two minimizers of \P\(y)\ , then <&x\ = <&X2- 



Proof. Let x\,X2 be two minimizers of V\(y) Suppose that $xi ^ $X2- Take 

2 
2 



2:3 = px\ + (1 — p)x2, p £ (0, 1). Strict convexity of u n- \\y — u|| 2 implies that 



-Hy-Sarslla < § III/ - **illa + ^ ll» ~ ^lla ■ 

Jensen's inequality again applied to the i 1 norm gives 

\\D*x 3 \\ 1 ^p\\D*x 1 \\ 1 + (l-p)\\D*x 2 \\ 1 . 

Together, these two inequalities yield C yt \(x3) < Cy\(xi), which contradicts 
our initial assumption that X\ is a minimizer of ( I'Pa (^)| ) ■ d 

A. 2. Proof of Theorem^ 

Proof. Let (y, A) ^ W. By construction, the vector xt(y) obeys D*jX±(y) = 0. 
Accordingly, for (y, A) sufficiently close to (y, A), one has 

sign(D*a$(y)) = sign(C*^(y)). 

Since 2;^ is a solution of | [pA(y)| )i using Lemmas [2] and |3j there exists a such 
that 

n [J1 y-Afi [J1 s / + AL> J( T = and < 1. (17) 

Let us split J = KUL, KC\L = $ such that Ho^lL = 1 and HctlII^ < 1. Note 
that o-k e {-1,1} |K| . 

We first suppose that ImfT^ C ImDi. To prove that xtiy) is solution to 
(V\(y)), we show that there exists a such that ||<x|| ^5 1 and 

f[ [JI y - Af) [J1 s, + XDko* + \D L a L = 0. 
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We impose that o~k — o~k and take <tl as 



Hence, 



Il [J] y- Xfl^sr + XDja 
= fl [J] y~\n [J] s I + XD K a K + XD L a L 



-D L Dt±m 



A - A X. , 

— + riy-y) 



= Yl [J] y - Xn [J] Sl + XD K a K + XD L a L 



n [J] (y - y) + (A - A)^[ J l S/ - (A - X)Dk(Jk - (A - X)Dlo~l 



-D T .Dt±m 
A 



A - A A, , 

y+ riy-y) 



x 



A 



Since ImiTW C Im-Dx and D^D^ is the orthogonal projector on Im(£>L), we 
have fiM = D L L>+n[ J ]. It follows that, 

A - A 



A 



U [J] y - X& [J] SI + XD K a K + XD L <j L 



= 0. 



Now, for (y, A) close enough to (y, A), we have 



A 



a y+ \^ y ~ y ^ 



^ i. 



whence we deduce that is a solution of ("P> (?/))■ 

In fact, for (y, A) ^ "H, we inevitably have ImfiM C ImD^. Indeed, pro- 
jecting (17 1 on Gl gives 

= Pg L (tl [J] y - Xn [J] Sl + XDjaj = Pg L (tl [J] y - XCl [J] Sl + XD K a K 

or equivalently 

V Q J\ {A y = Pg L X(n^ Sl - D K a K ). 

If ImflM g Im£>L, then (y, A) G H j,K,sr,a K , a contradiction. This concludes 
the proof. □ 

^4.5. Proof of Theorem^ 

Proof of (i). First it is easy to see that Hj,k,s ja,a K in Definition [4] is a vector 
subspace of R* 3 x R. Moreover Uj >K>Sjet<JK C kerP 5jVK B, where £ = [ftM - 

fiMsjc +D K <7 K ]. 
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Now, fix A. %. a is included in 



«*= u u u u * 

./ ! I ••• ,P} A"CJ Sjce {_i i i}U c l crKe {_l ! l}|K| 

{777} holds Imn [J ](ZIm_D JVJC 



J,K,Sjc ,(T K ) 



where 

^W JC ,^ = {2/ e m q \ P e . AK n [J1 y = P SAK A(f2M Sjc + % ffjr )} , 

bince is an affrne subspace of H.** with 

dim('Hj Ksjc aK ) = dim(kerPg JVif n[ J ]) < Q, where the inequality follows from 
the rank-nullity theorem and the fact that Gj\k is a (nonempty) strict subspace 
of MP . Given that T-L x is a finite union of subspaces 1-0) K SJC aK all strictly in- 
cluded in MP, W x has a Lebesgue measure zero and so does H. t x Q n- 

Note that with a similar reasoning, one can show that H is also of zero 
Lebesgue measure using the fact that B <2 lmDj\ K if ftM <2 lm.Dn K since 
ImPcimB. □ 

Proof of (ii). The proof of this statement is constructive. Denote by M.\{y) the 



set of minimizers of ( V\ (y) ) . To lighten the notation, we drop the dependence 
on y and A from x* x {y) G M\(y). 

First step. We prove the following statement 

( (x* G M\(y) A -.(i? SU pp(£.*x*)0) 

=>■ G M A (y) A supp(I5*z**) C suppp***)), 

wher A and -i are respectively the logical conjunction and negation symbols. 
In plain words, let x* be a solution of \V\{y)\ - Suppose (Hj) docs not hold 



where J is the Z?-cosupport of x*. We prove that there exists a solution x*^{y) 
of .D-support strictly included in I = J c 



Since (Hj) does not hold, there exists z G Ker$ with z; / and D}z = 0. 
We define for every t G M, the vector u t = + te. Denote i3 the subset of R 
defined by 

8={tet \ sign(D*i; t ) = sign(£>*a;*)} , 
B is a non-empty convex set and G B. Moreover for all t G B, d£ Vt \(vt) = 



dC Vi \{x*). It then follows from Lemma [T tl 
|^a(j/)])- As a consequence, using Lemma 4j 



that for all t G B, v t is a solution of 
we get 



Vie6, $v t = $x* and H-D*^^ = H^a;*^ 
Since lim IID*^!!, = +oo, the set B is bounded. It is also an open set as a 

|t|-K» 

finite intersection of P open sets corresponding to the solutions to sign((D*x*)i+ 
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tzi) = sign((D*a;*)i)). Hence, B is an open interval of K which contains 0, i.e. 
there exist t\,to GR such that 

B =]ti,to[ where — oo < t\ < and < t < +oo. 

Since to $ B, the D-support of v to is strictly included in I. Moreover by 
continuity, 

$ Wt0 =$x* and |pX Hi = ll-O^li- 



Hence, v t „ is a solution of (V\(y) ) of Z?-support strictly included in I. 

Second step. We now prove our claim, i.e. 

3x* G M\{y) such that (-ff S u PP (,D*iE*)<=) holds. 

Consider {x*^, . . . , x* p+1 - ) ) G (M\(y)) p+1 such that for every i G {1, . . . , -P+l}, 
the condition (ifjj does not hold for J, = supp(D*a;*^) c and Ji D J 2 3 • • • 3 
Jp+i. Then, we have a strictly increasing sequence of P+l subsets of {1, ... , P} 
which is impossible. Hence, according to the first step of our proof, there exists 
i G {1, . . . , P + 1} such that (Hj.) holds. □ 

Proof of (Hi). By virtue of statement (ii), there exists a solution x\{y) of 
I fPxCy)! such that (Hj) holds. Let us consider this solution. Using Theorem [T] 
for y close enough to y, we have 

§x\{y) = $r [J] $*y - \<f>T [J] D lSl . 

where J is the £>-cosupport of x\(y). Since / (hence J) and s/ are locally 
constant under the assumptions of the theorem, so is the vector X&T^Disi, it 
follows that n* x (y) = <frx* x (y) can be written as 

whence we deduce 



9y 

Moreover, owing to statement (i), this expression is valid on \ H.,\, a set of 
full Lebesgue measure. □ 

A. 4- Proof of Theorem^ 

We First recall Stein's lemma whose proof can be found in [27] . 

Lemma 5 (Stein's lemma). Let y = $xo + w with w ~ A/"(0, o- 2 1Aq). As- 
sume that g : y i-> g(y) is weakly differentiable (and a fortiori a single-valued 
mapping), then 



E w (w, g{y)) = a 2 E w tv 



dg(y) 

dy 
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Let us now turn to the proof of Theorem [3] 

Proof. Since y H ► fig(y) = <frxg(y) is weakly differentiable, so is j4*A/ie(y) and 
we have 

dy dy 

Then, using Lemma [5j we get 



IM™. A*A/z e (y)) = tr^tr I 1 = ^E^/ (y) . 

Using the decomposition Ay — A^xq + Aw, we obtain 

E w \\Ay - Afig(y)\\l = E w \\A$x + Aw\\ 2 2 - 2E w (A<S>x a + Aw, Afi e (y)) 

+ E w \\A$e{y)\\ 2 2 
= E w \\A$x Q \\l + o- 2 tr(A*A) - 2E w (A<S>x a , A%{y)) 

- 2E w (w, A*A-fi e (y)) + E w \\Ali e {y)\\l 
= E w \\A<f>x - Afie{y)\\l 

+ g 1 tr(A*A) - 2a 2 E w dfg(y) . 

Moreover, £\ cov w ((Ay)i, (Ap,e(y))i) = E w (Aw, Afi,g(y)), which shows that 

^ A 

dfg (y) is indeed an unbiased estimator of df A . □ 
A. 5. Proof of Theorem^ 

Proof. Denote by R A the reliability of the GSURE for the estimator xg(y), i.e. 

R A = E w (GSURE A (£ e (y)) - SE A (x e (y))) 2 . 
Let Q (xg(y)) be the quantity defined as 

Q A (xg(y)) = \\AfioWl + \\AMv)\\l ~ 2 ( A V> *Mv)) + 
We have GSURE A (xg(y)) ~ Q A {x e {y)) = \\Ay\\ 2 2 - E w \\Ay\$, where 
E w \\Ay\\ 2 2 = \\A^\\ 2 2 + o- 2 tY(A*A), 
and V™ \\Ayf 2 = 2a 4 (tx[(A*A) 2 } + 2 M!^llf j . 

It results that E w (GSURE A (x g(y)) - Q A (xg(y))) = 0, and hence 
E w (Q A (x e (y))) = E w (GSVRE A (x e (y))) = E w (SE A ) . 
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Remark that Q A (xg(y))-SE A (xg(y)) = 2 (<J 2 dft(y) - (Aw, Aj2 e {v))) ■ We. 
now rewrite the reliability in the following form 

R A =E W (GSVRE A (x e (y)) - Q A {x e (y)) + Q A (Mv)) ~ SE A (x e (y))) 2 
=Y W \\Ay\\ 2 2 +E w (Q A (x e (y))~SE A (x e (y))) 2 + 
4E W (\\Ay\\l(a 2 dft(y)~(Aw, AMv)))) 



=T 



Lemma [5] gives E w (Aw, Aflg(y)) = a 2 E w df g (y), and we get 

T = 2E W ((Aw, Afi a )(a 2 dft{y) ~ (Aw, AMv)))) + 



E, 



Ti 

■ A 



(\\Awf 2 (a 2 df e (y)-(Aw, Afa(y))) 



^ A 

Let H* A (y) = A* Afxg(y), fi A — A*A[i and wa — A* Aw. Observe that df g (y) = 
div n\(y) and Wi (fj, A (y))- is weakly differentiable. Then by integration by parts 
(in the same vein as in the proof of Stein's Lemma [5]), we get 

Ti = 2 ( t 2 ]Te w (^(^M^i) - 2^E W (w i ( J M° A ) i w j f i * A (y) j ) 



i-'j 



and 



T 2 = (T 2 ^E^ (wiiwAh 9 ^ 4 ) - £> w (w l (w A ) l w ] ^ A (y) ) 



In turn, 



Hence, 



i,j 



T 2 = -2a 2 J2^a, ((fJ>* A (y))j( w A)j) = -2a 2 E w (w A , fi\{y)) 
j 

= -2a 2 E w (A*Aw, A*A^(y)) = -2a 4 E w dfg A (y) , 



2G 



where the last equality is again a consequence of Lemma [5j It follows that 



T = -2a 2 
we have 



. A A , 



E w (fi° A , n* A (v)) + cr 2 E w df g (y) . Moreover using [5Sj Property 1] 



e w (Q A (xg(y))-sv A (x e (y))) 2 

= 4E U , (a 2 dwn A (y) - (w, fi* A (y))Y 



4a 2 [E w \\^ A (y)\\ 



Therefore, the reliability is given by 



a 2 E w tr 



g/4(j/) 

dy 



R A = 2aHr[{A* A) 2 ]+Aa 2 E w \U° A - fi A (y)\ 



- 4cr 4 E„ 



tr 



-2dff A (y) 



Rearranging the last term above, we obtain the derived expression 



□ 



A. 6. Proof of Corollary^ 

Proof. Let A e W + . From Theorem J2|hi) , y H> <&x\{y) is differentiable almost 
everywhere and we can invoke Theorem [3] to derive the GSURE expressions. 

We also observe that V = $r[ J ]<E>* is the orthogonal projector on ImV = 
4>(C/j), so that trV^ = dim(ImV) = rank($Pg 7 ). Since $ is injective on Gj 
under (Hj\, it follows that trV = dim(Cfj). Hence, using Theorem [3] with 
A = Id, Theorem |2jn) and Q, it follows that dim((?j) is an unbiased estimator 
ofrf/(A). □ 

A. 7. Proof of Corollary^ 

Proof. As V = $r[ J l$* is the orthogonal projector on we have 

tr [AVA*A{2ld - V)A*\ > . 

Moreover A* A is Hermitian, hence tr [(A* A) 2 ] = we obtain (with the 

notation of Section A. 5 1 the following upper-bound of the reliability 



where 



R A < 2a 4 \\A*A\\ 2 F + Aa 2 \\Af E || Mo - ^ x {y)\\ 2 
is the Frobenius norm and |.| the matrix spectral norm. Then, for 



A £ M Mx< 3, using classical inequalities, we get 

\\A* A\\ 2 F < rank(A) ||A|| 4 = min(M, Q) \\A\\ A ^ Q\\A\ 
Since x\(y) is a solution of \P\{y)\ , we have 

\\\y-^{y)t^C yiX {xl{y))^C y , x {Q) 



1 II II 2 

2 llvlla • 
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Thus, using Jensen's inequality, we get 

E ||mo - t*l(y)\\l < 2E(||/i -y\\l + \\y- fxl(y)\\l) 

< 2E(Hla + NI2) <2(||/io|la + 2 Q^ : 
Altogether, this yields the following upper bound 
R A 



ct*Q< 



18 gJlMojH 



Since ||/xo|| 2 < oo, this concludes the proof. 

A . 8. Proof of Proposition [7] 
Proof. We have 



□ 



tr 



= tr 



Hence denoting v(z) — T^<fr*z, and using the fact that for any matrix U, 
trll = E Z (Z, UZ), we arrive at pj). 

We then use the fact that r'^*, the inverse of <J> on Qj, is the mapping 
that solves the following linearly constrained least-squares problem 

rl j l$*z = argmin \\3>h - z\\\ . 

Writing the KKT conditions of this problem leads to (|9j), where v are the La- 
grange multipliers. □ 
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